Class Project: Analysis of the S&P 500 to show historical trends on when to make the most profit¶
David Meduga S&DS 123
Introduction¶
"The S&P 500 is a market-capitalization-weighted index of 500 leading publicly traded companies in the U.S. The index actually has 503 components because three of them have two share classes listed." (Investopedia) The S&P500 is a leading indicator in the financial world: it signals how the economy is doing and reflects the financial wellbeing of the top 500 companies. The goal of this project is to analyze the S&P500 and, based on this data, show investing strategies indicating when it was most profitable to make a trade, either buying long or selling short. In my final project, I will compute the adjusted returns of the S&P 500, take the growth rate of specific stocks, use the top nine stocks that make up a majority of the S&P500 to show their growth over time, and apply a basic investment strategy to show when it was optimal to buy or sell an individual stock in the S&P500.
The project focuses on the financial world because it encompasses the entire economy: a shock in the stock market is usually driven by a world event that causes serious economic problems. Being able to track the stock market and see changes in real time means staying ahead of the game, and that information is very valuable; it leads to profits, real-world impact, and innovation. This is why I am doing an analysis of the S&P500: to see historical trends and make trading decisions based on previous data.
Click on this GitHub page to interact with the interactive graphs: https://phonixfire01.github.io/YDATA_PROJECT/Final_Project.html
Where you got the data from, including a link to the website where you got the data if applicable.¶
I obtained the data from the Python package yfinance, which pulls from Yahoo Finance. I used https://en.wikipedia.org/wiki/List_of_S%26P_500_companies to get the list of S&P500 companies and their ticker symbols.
What other analyses have already been done with the data, and possibly links to other analyses.¶
The analyses that have already been done with the S&P 500 are numerous. Here is a link to GitHub, https://github.com/topics/financial-analysis?l=python, listing 314 public repositories from others who have done similar analyses. There have been many other analyses of the S&P500 from firms such as J.P. Morgan, Citadel, and DRW Trading Group. Each group has its own unique way of analyzing the data and reaches different conclusions about the most profitable trading strategies. They each use different techniques that yield different amounts of profit, and each has a range of teams dedicated to researching the next strategy.
Data Wrangling¶
I am using adj_close = the adjusted close of a stock, which is the closing price after adjusting for all applicable splits and dividend distributions; df = DataFrame of the data downloaded from yfinance; fig = an interactive plot; top9stocks = the top nine stocks in the S&P500; and rets = return rate, which is the net gain or loss of an investment over a specified time period, expressed as a percentage of the investment's initial cost. All other functions are explained as needed above the code.
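As a hedged sketch of the return-rate definition above (the series values and variable names here are illustrative, not from the project data):

```python
import pandas as pd

# Illustrative closing prices (made-up values, not real market data)
prices = pd.Series([100.0, 105.0, 102.9, 110.0])

# Day-over-day simple returns: percentage change between consecutive closes
simple_rets = prices.pct_change()

# Return rate over the whole period, as a fraction of the initial cost
total_return = prices.iloc[-1] / prices.iloc[0] - 1
print(f"Total return: {total_return:.1%}")  # Total return: 10.0%
```

The same `pct_change` pattern applies column-by-column to a whole DataFrame of adjusted closes.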
Step 1 Installing yfinance and Importing Other Packages¶
#run the pip install if you haven't already installed the package
#pip install yfinance
import pandas as pd
import yfinance as yf
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from datetime import datetime
%matplotlib inline
import pandas_datareader as web
Step 2 Web Scraping from Wikipedia to see all companies inside the S&P500¶
#The Wikipedia URL of the S&P 500
sp_wiki_url = "https://en.wikipedia.org/wiki/List_of_S%26P_500_companies"
# Reading the HTML table into a list of DataFrames
sp_wiki_df_list = pd.read_html(sp_wiki_url)
# Only getting the information needed containing the company information
sp_df = sp_wiki_df_list[0]
# Extract the ticker symbol columns
sp_ticker_list = list(sp_df['Symbol'].values)
#now creating a dataframe from yahoo finance to extract the rest of the stocks data over a 44 year period
df = yf.download(sp_ticker_list, start='1980-01-01', end='2024-01-01')
[*********************100%%**********************] 503 of 503 completed
4 Failed downloads:
['BF.B']: Exception('%ticker%: No price data found, symbol may be delisted (1d 1980-01-01 -> 2024-01-01)')
['SOLV', 'GEV']: Exception("%ticker%: Data doesn't exist for startDate = 315550800, endDate = 1704085200")
['BRK.B']: Exception('%ticker%: No timezone found, symbol may be delisted')
#Getting the Adjusted close for each Stock
adj_close = df['Adj Close']
#listing the column names, one per ticker
adj_close.columns
Index(['A', 'AAL', 'AAPL', 'ABBV', 'ABNB', 'ABT', 'ACGL', 'ACN', 'ADBE', 'ADI',
...
'WTW', 'WY', 'WYNN', 'XEL', 'XOM', 'XYL', 'YUM', 'ZBH', 'ZBRA', 'ZTS'],
dtype='object', name='Ticker', length=503)
#pulling the S&P500 from yahoo finance and then using an interactive plot to show the adj close over time
sp_ticker_list = ['^GSPC']
df = yf.download(sp_ticker_list, start='1980-01-01', end='2024-01-01')
#An interactive plot using Plotly
fig = px.line(df, x=df.index, y='Adj Close', title='S&P500 Adjusted Close Price')
fig.update_layout(
xaxis_title='Date',
yaxis_title='Price',
height=800,
width=1200)
fig.show()
[*********************100%%**********************] 1 of 1 completed
This graph illustrates the historical performance of the S&P 500 index, reflecting its fluctuations in value across a 40-year period. Understanding why these movements occur is crucial for finding the right time to invest and maximizing returns from the S&P 500. The graph highlights periods of both growth and decline in the index's value, offering insight into the factors driving these fluctuations.¶
To understand these fluctuations, let's explore the events contributing to the rises and dips observed in the graph. By analyzing the factors influencing financial markets, we can better comprehend the underlying drivers shaping the performance of the S&P 500.¶
Real World Impacts on the S&P500¶
df = yf.download("^GSPC", start = '1980-01-01')
[*********************100%%**********************] 1 of 1 completed
#Historical Financial crashes
important_dates = {
'The Oil Crisis': '1982-04-29',
'The Tech Bubble': '2000-09-11',
'Financial Crisis and Great Recession': '2007-10-12',
'Covid 19 Pandemic': '2020-03-20',
}
#using the interactive graph from earlier and putting a line when the financial Crashes happened
fig = go.Figure(data = [go.Candlestick(
x = df.index,
open = df['Open'],
close = df['Close'],
high = df['High'],
low =df['Low'],)])
fig.update_layout(
title= "S&P500 Shocks",
yaxis_title= "S&P500 Stock",
shapes = [
dict(x0 = '1982-04-29', x1= '1982-04-29', y0=0, y1=1, xref = 'x', yref= 'paper', line_width = 2),
dict(x0 = '2000-09-11', x1= '2000-09-11', y0=0, y1=1, xref = 'x', yref= 'paper', line_width = 2),
dict(x0 = '2007-10-12', x1= '2007-10-12', y0=0, y1=1, xref = 'x', yref= 'paper', line_width = 2),
dict(x0 = '2020-03-20', x1= '2020-03-20', y0=0, y1=1, xref = 'x', yref= 'paper', line_width = 2)],
annotations = [
dict(x= '1982-04-29', y = 1.1 , xref ='x', yref = 'paper', showarrow = False,
xanchor = 'left', text = 'The Oil Crisis'),
dict(x= '2000-09-11', y = 1.1, xref ='x', yref = 'paper', showarrow = False,
xanchor = 'left', text = 'The Tech Bubble'),
dict(x= '2007-10-12', y = 1.1, xref ='x', yref = 'paper', showarrow = False,
xanchor = 'left', text = 'Financial Crisis and Great Recession'),
dict(x= '2020-03-20', y = 1.1, xref ='x', yref = 'paper', showarrow = False,
xanchor = 'left', text = 'Covid 19 Pandemic')])
fig.update_layout(xaxis_rangeslider_visible= False)
This graph provides a historical perspective on how global events have profoundly influenced the performance of the S&P 500. It vividly depicts how the index responds during periods of recession or economic downturns, often experiencing downward shifts that persist for extended durations.¶
By studying the graph, investors can discern crucial insights into when to strategically enter or exit the market, aligning their investment choices with economic conditions. It underscores the importance of conducting thorough analysis and considering broader macroeconomic trends when navigating the complexities of investing in the financial markets.¶
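One way to quantify how severe these downturns are is a drawdown calculation: the percentage drop from the running peak. This is a sketch on made-up prices, not the actual S&P 500 series:

```python
import pandas as pd

# Made-up daily closes around a hypothetical shock (illustrative only)
close = pd.Series([100.0, 104.0, 108.0, 90.0, 81.0, 85.0, 95.0])

# Drawdown: fractional drop from the highest close seen so far
running_max = close.cummax()
drawdown = close / running_max - 1

max_drawdown = drawdown.min()
print(f"Maximum drawdown: {max_drawdown:.1%}")  # Maximum drawdown: -25.0%
```

Applied to the real `^GSPC` closes, this would put a number on each of the four shocks marked above.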
Now Let's view the top stocks that make up a large portion of the S&P500¶
top9stocks = dict(
AAPL = 'Apple Stock', AMZN = 'Amazon Stock', NVDA = 'NVIDIA', GOOGL = 'Alphabet Class A', TSLA = 'Tesla', GOOG = 'Alphabet Class C',
META = 'Meta Platforms Class A', MSFT = 'Microsoft Stock', UNH = 'UnitedHealth Group')
list(top9stocks.keys())
['AAPL', 'AMZN', 'NVDA', 'GOOGL', 'TSLA', 'GOOG', 'META', 'MSFT', 'UNH']
df_top9 = yf.download(list(top9stocks.keys()))
[*********************100%%**********************] 9 of 9 completed
adj_close_top9 = df_top9['Adj Close']
adj_close_top9 = adj_close_top9.dropna()
Top 9 Stocks Adjusted Close Over Time¶
adj_close_top9.dropna().plot(subplots = True, figsize = (14,7));
This graph illustrates how each of these stocks has increased in value over the past decade, providing insight into how top-performing stocks generally track the movements of the S&P 500 index.¶
Now Let's clean up the data¶
#A clean listing of each ticker and its respective company name
for key, value in top9stocks.items():
print(f"{key:8s} | {value}")
AAPL     | Apple Stock
AMZN     | Amazon Stock
NVDA     | NVIDIA
GOOGL    | Alphabet Class A
TSLA     | Tesla
GOOG     | Alphabet Class C
META     | Meta Platforms Class A
MSFT     | Microsoft Stock
UNH      | UnitedHealth Group
#rounding to the second decimal place
adj_close_top9.describe().round(2)
| Ticker | AAPL | AMZN | GOOG | GOOGL | META | MSFT | NVDA | TSLA | UNH |
|---|---|---|---|---|---|---|---|---|---|
| count | 3007.00 | 3007.00 | 3007.00 | 3007.00 | 3007.00 | 3007.00 | 3007.00 | 3007.00 | 3007.00 |
| mean | 70.65 | 77.25 | 64.60 | 64.67 | 165.91 | 135.72 | 105.68 | 84.64 | 237.21 |
| std | 58.65 | 55.21 | 40.63 | 40.04 | 102.71 | 111.93 | 159.98 | 105.32 | 159.89 |
| min | 11.98 | 10.41 | 13.92 | 13.99 | 17.71 | 21.51 | 2.61 | 1.74 | 42.57 |
| 25% | 24.16 | 21.35 | 29.24 | 29.73 | 82.06 | 40.38 | 5.27 | 14.35 | 100.45 |
| 50% | 41.17 | 77.85 | 53.50 | 53.85 | 159.25 | 90.59 | 43.06 | 20.20 | 212.45 |
| 75% | 129.51 | 123.54 | 96.09 | 95.64 | 212.84 | 234.75 | 138.29 | 180.94 | 380.82 |
| max | 197.86 | 189.05 | 173.69 | 171.95 | 527.34 | 429.37 | 950.02 | 409.97 | 548.93 |
#aggregating the data to show only the summary statistics we want to see
adj_close_top9.aggregate(['min', 'mean', 'std', 'median', 'max']).round(2)
| Ticker | AAPL | AMZN | GOOG | GOOGL | META | MSFT | NVDA | TSLA | UNH |
|---|---|---|---|---|---|---|---|---|---|
| min | 11.98 | 10.41 | 13.92 | 13.99 | 17.71 | 21.51 | 2.61 | 1.74 | 42.57 |
| mean | 70.65 | 77.25 | 64.60 | 64.67 | 165.91 | 135.72 | 105.68 | 84.64 | 237.21 |
| std | 58.65 | 55.21 | 40.63 | 40.04 | 102.71 | 111.93 | 159.98 | 105.32 | 159.89 |
| median | 41.17 | 77.85 | 53.50 | 53.85 | 159.25 | 90.59 | 43.06 | 20.20 | 212.45 |
| max | 197.86 | 189.05 | 173.69 | 171.95 | 527.34 | 429.37 | 950.02 | 409.97 | 548.93 |
Return Rate Graph for the Top 9 Stocks in the S&P500¶
#rets (log returns) was not defined earlier; computing it here so the cumulative plot below works
rets = np.log(adj_close_top9 / adj_close_top9.shift(1)).dropna()
#each graph individually
rets.cumsum().apply(np.exp).plot(subplots = True, figsize = (14,7));
This visualizes the cumulative returns of these stocks over the past decade, offering valuable insight into the profitability of investing in them compared to alternative options.¶
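The `rets.cumsum().apply(np.exp)` idiom works because log returns add over time, so exponentiating their running sum recovers the gross return since the start. A minimal check on made-up prices:

```python
import numpy as np
import pandas as pd

# Made-up prices (illustrative only)
prices = pd.Series([100.0, 110.0, 99.0, 118.8])

# Daily log returns
log_rets = np.log(prices / prices.shift(1)).dropna()

# exp of the cumulative sum equals the price relative to the starting price
gross = np.exp(log_rets.cumsum())
print(round(gross.iloc[-1], 6))  # 1.188, i.e. 118.8 / 100
```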
Now let's look at the data by week and month¶
adj_close_top9.resample('1w', label = 'right').last().head()
| Ticker | AAPL | AMZN | GOOG | GOOGL | META | MSFT | NVDA | TSLA | UNH |
|---|---|---|---|---|---|---|---|---|---|
| Date | |||||||||
| 2012-05-20 | 16.036407 | 10.6925 | 14.953949 | 15.025025 | 38.189480 | 23.528460 | 2.770253 | 1.837333 | 44.901157 |
| 2012-05-27 | 17.001230 | 10.6445 | 14.733027 | 14.803053 | 31.876179 | 23.359650 | 2.843636 | 1.987333 | 46.672581 |
| 2012-06-03 | 16.961918 | 10.4110 | 14.221195 | 14.288789 | 27.690619 | 22.869312 | 2.747319 | 1.876667 | 45.774391 |
| 2012-06-10 | 17.546381 | 10.9240 | 14.457061 | 14.525776 | 27.071278 | 23.833920 | 2.779426 | 2.005333 | 48.236095 |
| 2012-06-17 | 17.359217 | 10.9175 | 14.060049 | 14.126877 | 29.978193 | 24.131342 | 2.818411 | 1.994000 | 49.165634 |
# Resampling the data to monthly frequency (first observation of each month)
adj_close_top9.resample('1m', label = 'right').first().head()
| Ticker | AAPL | AMZN | GOOG | GOOGL | META | MSFT | NVDA | TSLA | UNH |
|---|---|---|---|---|---|---|---|---|---|
| Date | |||||||||
| 2012-05-31 | 16.036407 | 10.6925 | 14.953949 | 15.025025 | 38.189480 | 23.528460 | 2.770253 | 1.837333 | 44.901157 |
| 2012-06-30 | 16.961918 | 10.4110 | 14.221195 | 14.288789 | 27.690619 | 22.869312 | 2.747319 | 1.876667 | 45.774391 |
| 2012-07-31 | 17.915251 | 11.4660 | 14.457560 | 14.526276 | 30.737387 | 24.565416 | 3.084429 | 2.026667 | 46.961945 |
| 2012-08-31 | 18.347324 | 11.6045 | 15.757935 | 15.832833 | 20.857868 | 23.640997 | 3.070669 | 1.750000 | 42.746559 |
| 2012-09-30 | 20.495808 | 12.3940 | 16.962421 | 17.043043 | 17.711208 | 24.590597 | 3.045444 | 1.876000 | 45.542912 |
# Cumulative return rate resampled to monthly frequency
rets.cumsum().apply(np.exp).resample("1m", label = "right").last().plot(figsize = (10,5));
This graph demonstrates the significance of timing investments across various months and years, highlighting how exponential stock returns can be affected by strategic entry points.¶
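The `resample(..., label='right')` behavior used above can be verified on a small synthetic series; the dates and values here are made up for illustration:

```python
import pandas as pd

# Two weeks of made-up daily data starting on a Monday
idx = pd.date_range('2024-01-01', periods=14, freq='D')
s = pd.Series(range(14), index=idx, dtype=float)

# Keep the last observation of each week, labeled by the week-ending date
weekly = s.resample('W', label='right').last()
print(weekly.tolist())  # [6.0, 13.0]
```

Each weekly bucket keeps only its final value, so the resampled series is much shorter than the daily one.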
Let's dive into an analysis into Amazon Stock¶
#dropping the NA's
df = yf.download(list(top9stocks.keys()))
df = df['Open'].dropna()
[*********************100%%**********************] 9 of 9 completed
#The most recent data
df.tail()
| Ticker | AAPL | AMZN | GOOG | GOOGL | META | MSFT | NVDA | TSLA | UNH |
|---|---|---|---|---|---|---|---|---|---|
| Date | |||||||||
| 2024-04-25 | 169.529999 | 169.679993 | 153.360001 | 151.330002 | 421.399994 | 394.029999 | 788.679993 | 158.960007 | 488.959991 |
| 2024-04-26 | 169.880005 | 177.800003 | 175.990005 | 174.369995 | 441.459991 | 412.170013 | 838.179993 | 168.850006 | 492.000000 |
| 2024-04-29 | 173.369995 | 182.750000 | 170.770004 | 169.059998 | 439.559998 | 405.250000 | 875.950012 | 188.419998 | 495.709991 |
| 2024-04-30 | 173.330002 | 181.089996 | 167.380005 | 165.610001 | 431.049988 | 401.489990 | 872.400024 | 186.979996 | 488.959991 |
| 2024-05-01 | 169.580002 | 181.639999 | 166.179993 | 164.300003 | 428.600006 | 392.609985 | 850.770020 | 182.000000 | 479.260010 |
#now let's look at Amazon stock
symbol = 'AMZN'
window = 20
df['min'] = df[symbol].rolling(window=window).min()
df['mean'] = df[symbol].rolling(window=window).mean()
df['std'] = df[symbol].rolling(window=window).std()
df['median'] = df[symbol].rolling(window=window).median()
df['max'] = df[symbol].rolling(window=window).max()
df['ewma'] = df[symbol].ewm(halflife= .5, min_periods = window).mean()
Amazon stock price change over a year¶
ax = df[['min', 'mean', 'max']].iloc[-200:].plot(
figsize = (10,6), style = ['g--', 'r--', 'g--'],
lw=.6
)
df[symbol].iloc[-200:].plot(ax=ax, lw= 2.0);
Over the past year, Amazon's stock grew rapidly; this graph shows the rolling 20-day minimum, mean, and maximum around the price, indicating a sustained increase in value.¶
Let's try a basic Technical Analysis Using a Simple Moving Average¶
#Using rolling statistics: 42 trading days (about two months) and 252 trading days (about one year)
df['SMA1'] = df[symbol].rolling(window=42).mean()
df['SMA2'] = df[symbol].rolling(window=252).mean()
df[[symbol, 'SMA1', 'SMA2']].tail()
| Ticker | AMZN | SMA1 | SMA2 |
|---|---|---|---|
| Date | |||
| 2024-04-25 | 169.679993 | 178.930238 | 144.104286 |
| 2024-04-26 | 177.800003 | 179.018809 | 144.393016 |
| 2024-04-29 | 182.750000 | 179.264285 | 144.689008 |
| 2024-04-30 | 181.089996 | 179.456666 | 144.980119 |
| 2024-05-01 | 181.639999 | 179.573095 | 145.284445 |
df.dropna(inplace = True)
#positions: 1 = holding long, -1 = holding short
df['positions'] = np.where(df['SMA1'] > df['SMA2'], 1, -1)
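The positions column can be turned into a rough backtest. This sketch uses randomly generated prices and hypothetical variable names, so the numbers are not the project's actual results; the key idea is shifting the signal by one day to avoid look-ahead bias:

```python
import numpy as np
import pandas as pd

# Synthetic daily prices from random log returns (illustrative only)
rng = np.random.default_rng(0)
price = pd.Series(100 * np.exp(np.cumsum(rng.normal(0.0005, 0.01, 500))))

sma_fast = price.rolling(window=42).mean()
sma_slow = price.rolling(window=252).mean()

# +1 = long while the fast average is above the slow one, -1 = short otherwise
position = pd.Series(np.where(sma_fast > sma_slow, 1, -1), index=price.index)

# Apply yesterday's signal to today's return to avoid look-ahead bias
log_rets = np.log(price / price.shift(1))
strategy_rets = position.shift(1) * log_rets

print(f"Buy-and-hold: {np.exp(log_rets.sum()):.2f}x, "
      f"SMA crossover: {np.exp(strategy_rets.sum()):.2f}x")
```

With real data, `log_rets` would come from the downloaded adjusted closes instead of the synthetic series.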
Plotting the Simple Moving Averages¶
average_ax= df[[symbol, 'SMA1', 'SMA2', 'positions']].plot(figsize= (10,6),
secondary_y = 'positions')
average_ax.get_legend().set_bbox_to_anchor((0.25, 0.85))
#now let's test the investment strategy
import yfinance as yf
def compound_value(investment, symbol, start_date, end_date):
stock_data = yf.download(symbol, start=start_date, end=end_date)
daily_returns = stock_data['Close'].pct_change()
compounded_returns = (1 + daily_returns).cumprod()
compounded_value = investment * compounded_returns.iloc[-1]
return round(compounded_value, 2)
#Example:
investment = 100
symbol = 'AMZN'
start_date = '2015-01-01'
end_date = '2019-01-01'
final_value = compound_value(investment, symbol, start_date, end_date)  # a new name avoids shadowing the function
print(f'The compound value of the investment is: {final_value}')
[*********************100%%**********************] 1 of 1 completed
The compound value of the investment is: 486.83
Wow, the possibility of turning $100 into $486.83, nearly a fivefold multiple (roughly a 387% gain), in just four years through market investments is truly amazing.¶
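As a quick arithmetic check (the figures come from the example above): an ending value of $486.83 on a $100 stake is a 4.87x multiple, which is roughly a 387% gain over the initial cost, not 487%:

```python
# Distinguishing the ending multiple from the percentage gain
investment = 100
ending_value = 486.83

multiple = ending_value / investment                    # times the original stake
percent_gain = (ending_value - investment) / investment # gain relative to cost

print(f"{multiple:.2f}x, {percent_gain:.0%} gain")  # 4.87x, 387% gain
```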
Let's Compare Two Individual Stocks¶
# Two Stocks competing in the same industry
stock1 = 'AAPL'
stock2 = 'MSFT'
df = yf.download([stock1, stock2])
[*********************100%%**********************] 2 of 2 completed
df.dropna(inplace=True)
Graph of Apple and Microsoft Stock Rise Over Time¶
df = df['Adj Close']
# Comparison of the stocks on the same graph
df.loc['2010':'2019'].plot(secondary_y=stock2, figsize= (10,6));
How Correlated are these two stocks?¶
#computing daily log returns for the two stocks before correlating them
rets = np.log(df / df.shift(1)).dropna()
rets.corr()
| Ticker | AAPL | MSFT |
|---|---|---|
| Ticker | ||
| AAPL | 1.000000 | 0.458317 |
| MSFT | 0.458317 | 1.000000 |
#plotting the rolling one-year correlation, with the overall correlation as a horizontal line
ax = rets[stock1].rolling(window=252).corr(rets[stock2]).plot(figsize=(10,6))
ax.axhline(rets.corr().iloc[0,1], c = 'r');
Wow, I didn't think Microsoft and Apple would be that correlated, but it goes to show how similarly two companies in the same industry can move relative to one another¶
#configuring the Plotly renderer so the figures display in the notebook
import plotly.io as pio
pio.renderers.default = 'notebook'
Conclusion¶
In conclusion, the findings from this analysis of the S&P500 include: how different stocks are correlated with one another, how infrequently a simple trading strategy actually trades, and how an individual stock moves in a general direction. From this analysis, I learned how to use the yfinance package in Python and do a deep dive into a stock, computing its returns and growth rate over time. This gives very helpful insight into investing in any stock. The code is versatile for discovering trends among stocks and can compare stocks to find their correlation, helping an investor diversify their portfolio. I also learned that nearly every substantial dip in the S&P500 is due to a substantial world disaster or real-world economic problem. Knowing this, staying informed through news channels and articles will help me better understand the stock market and how a certain event will influence it. This analysis will help others look at stocks they are interested in and see the information in an organized way to improve their individual investments. Using Python, we are able to explore all this data and discover models to optimize our investments.
So when is the right time to invest? Based on all the data and analysis of the S&P500, one answer is to follow a simple investing strategy, like the buy-and-hold example shown earlier that turned $100 into $486.83, nearly a fivefold return. In principle, the best time to invest is when a stock is undervalued, at its lowest point, and the best time to sell is at its highest point. Simple, right? But no one can ever time this perfectly. The best we can do is look at historical data to see what a stock has done in the past while watching the current events that help determine its price. Never wait for the perfect moment to invest, though: the individual-stock analyses show that prices change frequently, so invest as soon as you can. The S&P500, made up of 500 top-performing stocks, has trended upward since the 1980s, so instead of worrying about the right time to invest and potentially losing out, invest now and let the profits accumulate.
Reflection¶
This assignment was quite enjoyable because it allowed me to uncover trends that I never knew existed and observe how stocks evolve over time. It was fascinating to realize the practical applications of Python in the real world and how various professionals utilize it in their daily work. For individuals interested in investing, this assignment offered valuable insights into using data to identify optimal times to buy or sell investments. While plotting the data and effectively utilizing the yfinance package presented some challenges, overall, it was a rewarding experience in working with data.
Citations¶
Data Science for Everyone, "Financial Data for Python: yfinance." YouTube, uploaded by Data Science for Everyone, 2021, https://www.youtube.com/watch?v=7wAQCwdvqqo&list=PLlbbWgBRF8EfO4WX13yEWlDUxkHsGPRdV
Data Science for Everyone, "Financial Data with Python: S&P 500 Data." YouTube, uploaded by Data Science for Everyone, 2022, https://www.youtube.com/watch?v=7wAQCwdvqqo&list=PLlbbWgBRF8EfO4WX13yEWlDUxkHsGPRdV&index=1
Data Science for Everyone, "Interactive Financial Plots with Plotly Express: S&P 500 Data." YouTube, uploaded by Data Science for Everyone, 2022, https://www.youtube.com/watch?v=lLYi-L5ptAk&list=PLlbbWgBRF8EfO4WX13yEWlDUxkHsGPRdV&index=19
Data Science for Everyone, "Introduction to Quick Candlestick Plots with Plotly: S&P 500 Data." YouTube, uploaded by Data Science for Everyone, 2022, https://www.youtube.com/watch?v=uidT_mdBzn4&list=PLlbbWgBRF8EfO4WX13yEWlDUxkHsGPRdV&index=20
Data Science for Everyone, "Candlestick Plot with JNJ & COVID Timeline." YouTube, uploaded by Data Science for Everyone, 2022, https://www.youtube.com/watch?v=oWkxWC9bc5Q&list=PLlbbWgBRF8EfO4WX13yEWlDUxkHsGPRdV&index=21
Data Science for Everyone, "Financial Data with Python." YouTube, uploaded by Data Science for Everyone, 2022, https://www.youtube.com/watch?v=jpj71hltkVQ&list=PLlbbWgBRF8EfO4WX13yEWlDUxkHsGPRdV&index=22
Data Science for Everyone, "Rolling Statistics for Financial Data with Python." YouTube, uploaded by Data Science for Everyone, 2022, https://www.youtube.com/watch?v=zYNWZmqR2mI&list=PLlbbWgBRF8EfO4WX13yEWlDUxkHsGPRdV&index=23
Data Science for Everyone, "Correlation Analysis with Financial Data." YouTube, uploaded by Data Science for Everyone, 2022, https://www.youtube.com/watch?v=ulbbzPG6ZHQ&list=PLlbbWgBRF8EfO4WX13yEWlDUxkHsGPRdV&index=24
Dsilva, M. (2022, July 12). "Python Stock Analysis for Beginners." Analytics Vidhya, https://www.analyticsvidhya.com/blog/2022/06/python-stock-analysis-for-beginners/
Fervent, "Calculating Stock Returns with Python (Code-along)." YouTube, uploaded by Fervent, 2020, https://www.youtube.com/watch?v=ulbbzPG6ZHQ&list=PLlbbWgBRF8EfO4WX13yEWlDUxkHsGPRdV&index=24
I also used Google to help fix errors and explain unfamiliar concepts.
APPENDIX¶
Reviewer name: Sergio Martinez
Reviewer email address: sergio.martinez.sam333@yale.edu
1. Summary
This project focuses on analyzing the S&P500 index to develop effective investment strategies. By examining adjusted returns, top stock growth rates, and basic trading tactics, this project aims to explain how to optimize buying and selling decisions in today's financial world. Through these findings, key insights into stock correlations, infrequent trading strategies, and individual stock behaviors are uncovered. Furthermore, this project provides valuable tools for informed investing which aids in the process of navigating market fluctuations. Ultimately, this project offers organized insights to enhance individual investment decisions, highlighting the significance of data-driven approaches in the financial world.
2. Overall strengths and weaknesses
The strengths of this project include how detailed the explanations are with regard to the code. I was able to understand what exactly David wanted me to get from his code, which allowed me to navigate his project with ease. Some weaknesses that I did notice throughout the project were that the visualizations could have benefitted from written explanations, allowing readers to gain a deeper insight into what was trying to be visualized. Furthermore, finding a way to shorten this project could also be of use, given that a more concise project might be able to get your message across better to readers. Otherwise, your project is really insightful, and I enjoyed learning about the S&P500 index :)
3. Major revisions
- Making your project fit within the page limit (6 to 10 pages)
- Putting the visualizations that didn't show up on GitHub and including a link in your project could also be of some benefit!
4. Minor revisions
- Consider moving some of the code throughout your project to the appendix, so you can meet the 6-10 page limit.
- Consider describing your data wrangling a bit more. It could be helpful for readers to know what exactly you did to get the data ready for analysis!
5. Rubric score
Rubric items where points would be taken off if not addressed:
- "Did not provide a written description of the insights that the graphs provide"
Total score: 88/90
Reviewer name: Thomas Snyder
Reviewer email address: thomas.snyder@yale.edu
1. Summary
The project aims to analyze the historical trends of the S&P500 index and develop investment strategies based on these insights. Utilizing Python and the yfinance package, the analysis focuses on correlations between stocks, frequency of trading strategies, and individual stock movements. Despite lacking clarity in articulating the specific research question, the project effectively demonstrates technical proficiency in data acquisition and analysis. Findings suggest that understanding market dynamics through data analysis can inform investment decisions, though deeper insights linking analysis results to initial objectives would enhance the project's impact and clarity.
2. Overall strengths and weaknesses
The project provides a clear introduction to the S&P500 and its importance, establishing a context for the analysis. The project demonstrates the use of Python and the yfinance package for data acquisition and analysis, showing technical proficiency. The reflection section offers personal insights and reflections on the project, showcasing engagement and critical thinking. The introduction lacks clarity in articulating the specific question or hypothesis being addressed by the analysis, which is a key component according to the rubric. Data cleaning and wrangling processes are mentioned but not clearly detailed, making it difficult to assess the rigor of these steps. The conclusions could be strengthened by providing more specific insights drawn from the analysis results, as well as tying them back to the initial question or objective. The analysis of correlations between different stocks, frequency of trading strategies, and individual stock movements provides interesting insights into market dynamics. The use of Python for data analysis and visualization is convincing, demonstrating the potential for leveraging programming tools in financial analysis.
3. Major revisions
- Clearly articulate the specific research question or hypothesis driving the analysis in the introduction.
- Provide more detailed explanations and documentation of the data cleaning and wrangling processes.
- Strengthen the conclusions by explicitly linking the analysis findings to the initial objectives and providing deeper insights into the implications of the results.
4. Minor revisions
- Improve the organization and clarity of the project report, ensuring smooth transitions between sections.
- Enhance the visualization quality and clarity, ensuring that all graphs are properly labeled and visually appealing.
- Consider incorporating statistical analyses or additional modeling techniques to deepen the analysis and provide more robust insights.
5. Rubric score
Items for points taken off:
- Introduction: Did not clearly describe what question the analysis is addressing.
- Data cleaning: Did not provide a clear explanation of the data cleaning process.
- Data visualization: Visual appearance of graphs could be improved.
- Analyses: Analyses do not give clear insights into the question of interest.
- Conclusions: Conclusions reiterate test results with no significant insight given.
Total score: 80/90
Reviewer name: Miles Kirkpatrick
Reviewer email address: miles.kirkpatrick@yale.edu
1. Summary
David is picking apart the stock market. In-depth and fairly methodical, this project covers not just the broad strokes of the S&P 500 but also individual stocks and company growth metrics. It's an effective primer on the stock market.
2. Overall strengths and weaknesses
The strength of this project comes from its detail. Financial data is easy to get lost in, but this project picks out the elements that are important and focuses on them, visualizing them in a variety of ways. All of the visualizations present something unique to the project. The weakness comes from the number of visualizations and the relative lack of transitions. There's a lot of information, and at times I felt lost and unsure why we were moving in one direction or another.
3. Major revisions
- Look at other analyses more in-depth
- Explain what we are looking at
4. Minor revisions
- Label your axes
- Make it clearer what the takeaways are
5. Rubric score
INTRO 13/15, DATA CLEANING 15/15, DATA VIS 20/25, ANALYSES 20/25, CONCLUSIONS 13/15, REFLECT 5/5
Rubric items where points would be taken off if not addressed:
- Did not describe what other analyses have been done on the data
- Cut some of the visualizations; we don't need all of them, although each individually is good
- Explain more what we are looking at; give some solid takeaways, since there are good insights here beyond developing the methodology
Total score: 76/90
import yfinance as yf

# This is the specific S&P 500 data (index ticker ^GSPC) pulled to show the growth of the index over time
df = yf.download("^GSPC", start='1980-01-01')
[*********************100%%**********************] 1 of 1 completed
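As a quick sanity check on a downloaded series like this, the total growth of the index can be computed from the first and last closing prices. A minimal sketch on a synthetic stand-in (the `Close` column name matches yfinance's output; the values here are made up for illustration):

```python
import pandas as pd

# Synthetic stand-in for the downloaded frame (the real df has a 'Close' column)
df_demo = pd.DataFrame(
    {"Close": [100.0, 110.0, 121.0, 133.1]},
    index=pd.date_range("1980-01-01", periods=4, freq="D"),
)

# Total growth factor from the first close to the last close
growth = df_demo["Close"].iloc[-1] / df_demo["Close"].iloc[0]
print(round(growth, 3))  # 1.331
```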
import plotly.graph_objects as go

# An interactive candlestick chart showing the open, close, high, and low of the S&P 500
fig = go.Figure(data=[go.Candlestick(
    x=df.index,
    open=df['Open'],
    close=df['Close'],
    high=df['High'],
    low=df['Low']
)])
fig.update_layout(title='Candlestick of the S&P500')
fig.show()
adj_close.head()
| Ticker | A | AAL | AAPL | ABBV | ABNB | ABT | ACGL | ACN | ADBE | ADI | ... | WTW | WY | WYNN | XEL | XOM | XYL | YUM | ZBH | ZBRA | ZTS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Date | |||||||||||||||||||||
| 2000-01-03 | 43.613018 | NaN | 0.846127 | NaN | NaN | 8.992847 | 1.277778 | NaN | 16.274673 | 28.438276 | ... | NaN | 11.505336 | NaN | 6.977994 | 18.328699 | NaN | 4.680298 | NaN | 25.027779 | NaN |
| 2000-01-04 | 40.281456 | NaN | 0.774790 | NaN | NaN | 8.735912 | 1.270833 | NaN | 14.909401 | 26.999619 | ... | NaN | 11.073115 | NaN | 7.138671 | 17.977631 | NaN | 4.586222 | NaN | 24.666668 | NaN |
| 2000-01-05 | 37.782791 | NaN | 0.786128 | NaN | NaN | 8.719852 | 1.388889 | NaN | 15.204173 | 27.393778 | ... | NaN | 11.659698 | NaN | 7.414118 | 18.957691 | NaN | 4.609740 | NaN | 25.138889 | NaN |
| 2000-01-06 | 36.344158 | NaN | 0.718097 | NaN | NaN | 9.024963 | 1.375000 | NaN | 15.328290 | 26.644884 | ... | NaN | 12.205122 | NaN | 7.345260 | 19.937767 | NaN | 4.570541 | NaN | 23.777779 | NaN |
| 2000-01-07 | 39.372856 | NaN | 0.752113 | NaN | NaN | 9.121320 | 1.451389 | NaN | 16.072983 | 27.393778 | ... | NaN | 11.803775 | NaN | 7.345260 | 19.879253 | NaN | 4.468628 | NaN | 23.513889 | NaN |
5 rows × 503 columns
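The later cells use `adj_close_top9`, which is presumably the subset of `adj_close` restricted to the nine largest index constituents. A minimal sketch of that column selection on a small synthetic frame (the ticker list is assumed from the columns shown in the outputs below):

```python
import pandas as pd

# Tiny synthetic stand-in for adj_close (the real frame has 503 ticker columns)
adj_close_demo = pd.DataFrame(
    {"AAPL": [1.0, 2.0], "MSFT": [3.0, 4.0], "ZTS": [5.0, 6.0]},
    index=pd.to_datetime(["2012-05-18", "2012-05-21"]),
)

# In the project the list would be: AAPL, AMZN, GOOG, GOOGL, META, MSFT, NVDA, TSLA, UNH
top9 = ["AAPL", "MSFT"]

# Selecting a list of columns keeps only those tickers
adj_close_top9_demo = adj_close_demo[top9]
print(list(adj_close_top9_demo.columns))  # ['AAPL', 'MSFT']
```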
# A larger version of the interactive candlestick chart, resized so the up and down moves are easier to read
fig.update_layout(
title='Interactive Candlestick Chart',
width=1200, # Set the width of the chart
height=800, # Set the height of the chart
xaxis_rangeslider_visible=False, # Hide the range slider
showlegend=False # Hide the legend
)
annotations = []
# important_dates maps a date string to an event name; the date goes on the
# x-axis and the event name becomes the label text (the original loop had
# these two swapped, which placed the labels at invalid x positions)
for date, name in important_dates.items():
    annotations.append(dict(x=date, y=1.1, xref='x', yref='paper', showarrow=False,
                            xanchor='left', text=name))
annotations
[{'x': '1982-04-29',
'y': 1.1,
'xref': 'x',
'yref': 'paper',
'showarrow': False,
'xanchor': 'left',
'text': 'The Oil Crisis'},
{'x': '2000-09-11',
'y': 1.1,
'xref': 'x',
'yref': 'paper',
'showarrow': False,
'xanchor': 'left',
'text': 'The Tech Bubble'},
{'x': '2007-10-12',
'y': 1.1,
'xref': 'x',
'yref': 'paper',
'showarrow': False,
'xanchor': 'left',
'text': 'Financial Crisis and Great Recession'},
{'x': '2020-03-20',
'y': 1.1,
'xref': 'x',
'yref': 'paper',
'showarrow': False,
'xanchor': 'left',
'text': 'Covid 19 Pandemic'}]
# Vertical reference lines at each important date; the original loop built each
# dict but never stored it, so collect the shapes into a list and attach them
shapes = [dict(x0=date, x1=date, y0=0, y1=1, xref='x', yref='paper', line_width=2)
          for date in important_dates]
fig.update_layout(shapes=shapes)
# The dtypes and non-null counts of the top-9 stocks
adj_close_top9.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 3004 entries, 2012-05-18 to 2024-04-26
Data columns (total 9 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   AAPL    3004 non-null   float64
 1   AMZN    3004 non-null   float64
 2   GOOG    3004 non-null   float64
 3   GOOGL   3004 non-null   float64
 4   META    3004 non-null   float64
 5   MSFT    3004 non-null   float64
 6   NVDA    3004 non-null   float64
 7   TSLA    3004 non-null   float64
 8   UNH     3004 non-null   float64
dtypes: float64(9)
memory usage: 234.7 KB
adj_close_top9.mean()
Ticker
AAPL      70.546617
AMZN      77.153430
GOOG      64.503532
GOOGL     64.573072
META     165.646974
MSFT     135.464508
NVDA     104.924719
TSLA      84.535851
UNH      236.966149
dtype: float64
adj_close_top9.diff().head()
| Ticker | AAPL | AMZN | GOOG | GOOGL | META | MSFT | NVDA | TSLA | UNH |
|---|---|---|---|---|---|---|---|---|---|
| Date | |||||||||
| 2012-05-18 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2012-05-21 | 0.934280 | 0.2130 | 0.341470 | 0.343093 | -4.195549 | 0.385839 | 0.048158 | 0.080667 | 1.297390 |
| 2012-05-22 | -0.130308 | -0.1390 | -0.331507 | -0.333083 | -3.026787 | 0.008049 | -0.034399 | 0.135333 | 0.141376 |
| 2012-05-23 | 0.410896 | 0.0975 | 0.215691 | 0.216717 | 0.998940 | -0.522499 | 0.068798 | 0.014667 | -0.299408 |
| 2012-05-24 | -0.158428 | -0.1020 | -0.144458 | -0.145144 | 1.028908 | -0.032160 | -0.075677 | -0.049333 | 0.715233 |
adj_close_top9.pct_change().round(3).head()
| Ticker | AAPL | AMZN | GOOG | GOOGL | META | MSFT | NVDA | TSLA | UNH |
|---|---|---|---|---|---|---|---|---|---|
| Date | |||||||||
| 2012-05-18 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2012-05-21 | 0.058 | 0.020 | 0.023 | 0.023 | -0.110 | 0.016 | 0.017 | 0.044 | 0.029 |
| 2012-05-22 | -0.008 | -0.013 | -0.022 | -0.022 | -0.089 | 0.000 | -0.012 | 0.071 | 0.003 |
| 2012-05-23 | 0.024 | 0.009 | 0.014 | 0.014 | 0.032 | -0.022 | 0.025 | 0.007 | -0.006 |
| 2012-05-24 | -0.009 | -0.009 | -0.010 | -0.010 | 0.032 | -0.001 | -0.027 | -0.024 | 0.016 |
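`diff()` gives absolute price changes while `pct_change()` gives relative ones; the `rets` series used in the cells below is presumably a log-return frame built from the same prices (that construction is an assumption, not shown in the source). A minimal sketch on synthetic prices:

```python
import numpy as np
import pandas as pd

prices = pd.Series([100.0, 110.0, 99.0])

abs_change = prices.diff()                    # absolute change: [NaN, 10.0, -11.0]
rel_change = prices.pct_change()              # relative change: [NaN, 0.10, -0.10]
log_rets = np.log(prices / prices.shift(1))   # hypothetical construction of rets

# Log returns add over time, so exponentiating the cumulative sum
# recovers the price relative to the starting value
print(round(float(np.exp(log_rets.cumsum()).iloc[-1]), 2))  # 0.99
```

This additivity is why the earlier `rets.cumsum().apply(np.exp)` plot shows cumulative growth.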
adj_close_top9.pct_change().mean().plot(kind = "bar", figsize = (5,5));
# rets holds daily log returns (e.g. rets = np.log(adj_close_top9 / adj_close_top9.shift(1)));
# exponentiating the cumulative sum plots cumulative gross return since the start
rets.cumsum().apply(np.exp).plot(figsize = (10,5));
# Resample the data to weekly frequency, keeping the last observation of each week
adj_close_top9.resample('1w', label = 'right').last().head()
| Ticker | AAPL | AMZN | GOOG | GOOGL | META | MSFT | NVDA | TSLA | UNH |
|---|---|---|---|---|---|---|---|---|---|
| Date | |||||||||
| 2012-05-20 | 16.036409 | 10.6925 | 14.953949 | 15.025025 | 38.189480 | 23.528460 | 2.770252 | 1.837333 | 44.901146 |
| 2012-05-27 | 17.001226 | 10.6445 | 14.733027 | 14.803053 | 31.876179 | 23.359650 | 2.843637 | 1.987333 | 46.672569 |
| 2012-06-03 | 16.961922 | 10.4110 | 14.221195 | 14.288789 | 27.690619 | 22.869308 | 2.747319 | 1.876667 | 45.774391 |
| 2012-06-10 | 17.546377 | 10.9240 | 14.457061 | 14.525776 | 27.071278 | 23.833923 | 2.779425 | 2.005333 | 48.236092 |
| 2012-06-17 | 17.359219 | 10.9175 | 14.060049 | 14.126877 | 29.978193 | 24.131344 | 2.818410 | 1.994000 | 49.165649 |
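The resampling step can be illustrated on a small synthetic daily series; a minimal sketch (weekly bins labelled by their right edge, as above):

```python
import pandas as pd

# Ten consecutive daily observations starting on a Friday
daily = pd.Series(range(10), index=pd.date_range("2012-05-18", periods=10, freq="D"))

# Weekly resample keeping the last value in each (Sunday-ending) week,
# labelled by the right bin edge
weekly = daily.resample("W", label="right").last()
print(list(weekly))  # [2, 9]
```

Values 0-2 fall in the week ending 2012-05-20 and values 3-9 in the week ending 2012-05-27, so only the last value of each week survives.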
# stock1 and stock2 are ticker symbols, presumably chosen earlier in the notebook
df3 = yf.download([stock1, stock2])
# Compare the two stocks' adjusted closes individually
df3['Adj Close'].plot(subplots=True, figsize = (10,6));
pd.plotting.scatter_matrix(rets,
alpha =.2,
diagonal = 'kde',
hist_kwds = {'bins':35},
figsize = (10,6));
rets.plot(subplots = True, figsize = (10,6));
# Shift the prices down by one row, so each date shows the previous trading day's value
adj_close_top9.shift(1)
| Ticker | AAPL | AMZN | GOOG | GOOGL | META | MSFT | NVDA | TSLA | UNH |
|---|---|---|---|---|---|---|---|---|---|
| Date | |||||||||
| 2012-05-18 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2012-05-21 | 16.036409 | 10.692500 | 14.953949 | 15.025025 | 38.189480 | 23.528463 | 2.770253 | 1.837333 | 44.901150 |
| 2012-05-22 | 16.970686 | 10.905500 | 15.295419 | 15.368118 | 33.993931 | 23.914299 | 2.818410 | 1.918000 | 46.198544 |
| 2012-05-23 | 16.840378 | 10.766500 | 14.963912 | 15.035035 | 30.967144 | 23.922340 | 2.784012 | 2.053333 | 46.339912 |
| 2012-05-24 | 17.251286 | 10.864000 | 15.179603 | 15.251752 | 31.966084 | 23.399841 | 2.852809 | 2.068000 | 46.040520 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2024-04-25 | 169.020004 | 176.589996 | 161.100006 | 159.130005 | 493.500000 | 409.059998 | 796.770020 | 162.130005 | 487.299988 |
| 2024-04-26 | 169.889999 | 173.669998 | 157.949997 | 156.000000 | 441.380005 | 399.040009 | 826.320007 | 170.179993 | 493.859985 |
| 2024-04-29 | 169.300003 | 179.619995 | 173.690002 | 171.949997 | 443.290009 | 406.320007 | 877.349976 | 168.289993 | 495.350006 |
| 2024-04-30 | 173.500000 | 180.960007 | 167.899994 | 166.149994 | 432.619995 | 402.250000 | 877.570007 | 194.050003 | 489.029999 |
| 2024-05-01 | 170.330002 | 175.000000 | 164.639999 | 162.779999 | 430.170013 | 389.329987 | 864.020020 | 183.279999 | 483.700012 |
3007 rows × 9 columns
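`shift(1)` moves every value one row later and leaves a `NaN` in the first row, which is exactly what makes lagged comparisons like `prices / prices.shift(1)` line up. A minimal sketch on synthetic data:

```python
import pandas as pd

prices = pd.Series([10.0, 11.0, 12.0],
                   index=pd.to_datetime(["2012-05-18", "2012-05-21", "2012-05-22"]))

# Each value moves down one row; the first row becomes NaN
lagged = prices.shift(1)
print(lagged.tolist())  # [nan, 10.0, 11.0]
```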
# Restrict the index data to the decade 2010-2019 using label-based date slicing
df_small = df.loc['2010':'2019']
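Partial-string date slicing with `.loc` is inclusive of both endpoints at the year level, so `'2010':'2019'` keeps everything from 2010-01-01 through 2019-12-31. A minimal sketch:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4],
              index=pd.to_datetime(["2009-12-31", "2010-06-01",
                                    "2019-12-30", "2020-01-02"]))

# Year-level slice: keeps the 2010 and 2019 rows, drops 2009 and 2020
print(s.loc["2010":"2019"].tolist())  # [2, 3]
```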